Co-evolutionary rates of functionally related yeast genes.

Evolutionary knowledge is often used to facilitate computational attempts at gene function prediction. One rich source of evolutionary information is the relative rates of gene sequence divergence, and in this report we explore the connection between gene evolutionary rates and function. We performed a genome-scale evaluation of the relationship between evolutionary rates and functional annotations for the yeast Saccharomyces cerevisiae. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated for 1,095 orthologous gene sets common to S. cerevisiae and six other closely related yeast species. Differences in evolutionary rates between pairs of genes (∆dN and ∆dS) were then compared to their functional similarities (sGO), which were measured using Gene Ontology (GO) annotations. Substantial and statistically significant correlations were found between ∆dN and sGO, whereas there is no apparent relationship between ∆dS and sGO. These results are consistent with a mode of action for natural selection that is based on similar rates of elimination of deleterious protein coding sequence variants for functionally related genes. The connection between gene evolutionary rates and function was stronger than seen for phylogenetic profiles, which have previously been employed to inform functional inference. The co-evolution of functionally related yeast genes points to the relevance of specific function for the efficacy of natural selection and underscores the utility of gene evolutionary rates for functional predictions.

Many post-genomic research efforts are aimed at uncovering relationships among genes, and the yeast Saccharomyces cerevisiae has served as a model system for such investigations (Cherry et al. 1998). A particular emphasis has been placed on high-throughput experimental attempts to elucidate various kinds of interactions between pairs of genes (or proteins), such as physical protein-protein interactions (Krogan et al. 2006), synthetic lethal gene pairs (Tong et al. 2004) and regulatory interactions between transcription factors and promoters (Harbison et al. 2004). The characterization of such relationships has the potential to reveal important clues as to the function of individual genes. Perhaps even more importantly, this line of inquiry can reveal higher order relationships, which define groups of genes that function as integrated biological systems (Ideker et al. 2001).
In addition to the kinds of experimental approaches mentioned above, computational analyses have also been brought to bear on the discovery of functional relationships between genes. These include classic information transfer techniques that rely on sequence similarity searches, using BLAST (Altschul et al. 1997) for instance, as well as more recently developed techniques that seek to exploit information on the co-occurrence of genes in different organisms (Pellegrini et al. 1999). What many of these computational approaches share is a reliance, either implicit or explicit, on evolutionary information. Information transfer via BLAST rests on the fact that molecular evolution is a conservative process marked by the preservation of biochemical function among related genes. Phylogenetic profile methods, which evaluate patterns of gene presence and absence across sets of species, work because natural selection tends to maintain functionally related genes as coherent sets within evolutionary lineages.
In this manuscript, we report an attempt to assess the utility of an additional source of evolutionary information for functional inference, namely the relative rates of gene evolution. Our approach is based on a growing body of literature that points to the connections between various phenotypic aspects of genes and their rates of evolution (Wall et al. 2005; Wolf et al. 2006). Among other findings, these studies have uncovered co-evolutionary connections between particular phenotypes and rates of gene evolution. For instance, genes that encode physically interacting proteins tend to evolve at similar rates (Fraser et al. 2002), as do genes that are co-expressed across similar tissue types (Jordan et al. 2004). It stands to reason that, as a general principle, genes with similar functional affinities should have similar (average) rates of evolution. We set out to test this notion by comparing the relative rates of evolution between orthologs, detected for S. cerevisiae and six closely related yeast species, with their Gene Ontology (GO) functional annotations.
Evolutionary Bioinformatics 2006: 2 271-276
A total of 1,095 sets of orthologous yeast genes were identified by using all-against-all reciprocal BLASTP searches (E-value threshold of 1e-10) between S. cerevisiae and six closely related species with complete whole-genome draft sequences (Cliften et al. 2003; Kellis et al. 2003): S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. castellii and S. kluyveri. Protein sequences of each orthologous set were aligned using ClustalW (Thompson et al. 1994), and the protein alignments were used to guide in-frame alignments of the corresponding DNA protein coding sequences. For each set of 7 aligned orthologous genes, pairwise non-synonymous (dN) and synonymous (dS) substitution rates were computed between S. cerevisiae and each of the other six species using the modified Nei-Gojobori method (Nei and Gojobori 1986) implemented in the program PAML (Yang 1997). The resulting evolutionary distance values were used to calculate pairwise distance differences (∆dN and ∆dS) between S. cerevisiae genes, across each of the six between-species comparisons. Specifically, for any pair of S. cerevisiae genes i and j: $\Delta dN_{ij} = |dN_i - dN_j|$ and $\Delta dS_{ij} = |dS_i - dS_j|$. This approach allowed us to evaluate the differences in evolutionary distances for pairs of genes over a range of phylogenetic distances from S. cerevisiae.
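The pairwise distance differences defined above can be illustrated with a minimal Python sketch. The function name, gene labels and dN values below are hypothetical and serve only as an illustration; they are not taken from the analysis or from PAML output.

```python
from itertools import combinations


def pairwise_rate_differences(rates):
    """Return {(gene_i, gene_j): |rate_i - rate_j|} for every unordered gene pair.

    `rates` maps a gene name to its dN (or dS) value for ONE
    between-species comparison (e.g. S. cerevisiae vs. S. bayanus).
    """
    return {
        (gi, gj): abs(rates[gi] - rates[gj])
        for gi, gj in combinations(sorted(rates), 2)
    }


# Hypothetical dN values for three genes in a single comparison.
dn = {"geneA": 0.10, "geneB": 0.25, "geneC": 0.12}
delta_dn = pairwise_rate_differences(dn)

for pair, diff in delta_dn.items():
    print(pair, round(diff, 3))
```

In this sketch the same function would be applied separately to the dN and dS values of each of the six between-species comparisons.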
A modified version of the semantic similarity method (Lord et al. 2003) was used to quantitatively assess the functional relationships between S. cerevisiae genes. Functional similarity coefficients between pairs of GO biological process terms, $s(c_k, c_p)$, were calculated by using an information content based approach. This approach takes into account both the frequency of biological process GO terms in the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org/) and the structure of the GO directed acyclic graph (DAG). The DAG was used to relate query terms by their closest parent term, i.e. the lowest common subsumer (lcs). For any term $c_i$, its information content, $\ln p(c_i)$, is a function of its number of occurrences normalized by the total number of occurrences of all GO biological process terms in the SGD. Term-term functional similarities were measured using the information content of the query terms, $\ln p(c_k)$ and $\ln p(c_p)$, and of their lowest common subsumer parent term, $\ln p_{lcs}(c_k, c_p)$ (Lin, 1998):

$$s(c_k, c_p) = \frac{2 \ln p_{lcs}(c_k, c_p)}{\ln p(c_k) + \ln p(c_p)}$$

For any gene pair ij, all term-term similarity values were aggregated at the level of gene products to yield $sGO_{ij}$ by using the average highest similarity aggregation scheme as follows (Azuaje et al. 2005). Given m and n distinct GO terms for each gene in the pair ij,

$$sGO_{ij} = \frac{1}{m + n} \left[ \sum_{k=1}^{m} \max_{p} s(c_k, c_p) + \sum_{p=1}^{n} \max_{k} s(c_k, c_p) \right]$$

Thus, we were able to quantify functional similarities as well as evolutionary rate differences for all pairwise relationships among the 1,095 orthologous S. cerevisiae genes. We then compared function with evolutionary rate to determine whether functionally related genes have more similar evolutionary rates on average. Gene pairs were sorted in ascending order according to the pairwise distance difference (∆dN and ∆dS), grouped into 10 bins, and average binned distance differences as well as average functional similarities (sGO) were calculated.
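As an illustration of the Lin (1998) term-term similarity and the average highest similarity aggregation named above, the following Python sketch works through a toy example. The term names, probabilities and lowest-common-subsumer table are hypothetical, and the handling of root terms (probability 1) is an assumption of this sketch rather than a detail given in the text.

```python
import math


def lin_similarity(p_k, p_p, p_lcs):
    """Lin (1998) similarity from the two term probabilities and the lcs probability."""
    denom = math.log(p_k) + math.log(p_p)
    if denom == 0:
        # Assumption: two root terms (probability 1) are treated as identical.
        return 1.0
    return 2.0 * math.log(p_lcs) / denom


def average_highest_similarity(terms_i, terms_j, sim):
    """Average, over all terms of both genes, of each term's best match in the other gene."""
    best_i = [max(sim(a, b) for b in terms_j) for a in terms_i]
    best_j = [max(sim(a, b) for a in terms_i) for b in terms_j]
    return (sum(best_i) + sum(best_j)) / (len(terms_i) + len(terms_j))


# Hypothetical toy ontology: t1 and t2 share the parent term "parent".
P = {"t1": 0.1, "t2": 0.1, "parent": 0.5}
LCS = {frozenset(["t1", "t2"]): "parent"}


def sim(a, b):
    """Term-term similarity using the toy probability and lcs tables."""
    lcs = a if a == b else LCS[frozenset([a, b])]
    return lin_similarity(P[a], P[b], P[lcs])


# Gene i is annotated with one term (m = 1), gene j with two (n = 2).
sgo = average_highest_similarity(["t1"], ["t1", "t2"], sim)
print(round(sgo, 3))
```

In a real setting the probabilities would be derived from annotation counts in SGD and the lcs from the GO DAG; both are replaced here by small lookup tables.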
For all six between-species comparisons, a clear linear trend exists between ∆dN and sGO (Figure 1), whereby ∆dN is negatively correlated with sGO (Figure 2a). Five out of the six ∆dN-sGO correlations are statistically significant at P < 0.01 (Figure 2b). In other words, genes that are more functionally similar tend to have smaller non-synonymous distance differences, on average, than genes with increasingly different functions. The only ∆dN-sGO correlation that was not significant was observed for the comparison between S. cerevisiae and S. paradoxus (Figure 2b). Among the six species we analyzed, S. paradoxus is the most closely related to S. cerevisiae; therefore, the lack of significance for this particular pair probably reflects the low resolution afforded by the small evolutionary distances between the two species. Consistent with this interpretation, the strength of the ∆dN-sGO negative correlation, as well as its statistical significance, tends to increase together with the distance between the species being compared (Figure 2). ∆dS, on the other hand, shows virtually no correlation with sGO. The magnitudes of the ∆dS-sGO correlations are uniformly lower than seen for ∆dN; the slopes of the trend lines are notably shallower, and the signs of the correlation coefficients and trend line slopes both fluctuate between positive and negative (Figure 1 and Figure 2).
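The binned comparison behind these correlations (sorting gene pairs by distance difference, grouping them into 10 bins and correlating the bin averages) can be sketched in Python as follows. The use of equal-count bins and of the Pearson coefficient are assumptions of this sketch, since the text does not specify either, and the input data are synthetic.

```python
import math


def bin_averages(pairs, n_bins=10):
    """Sort (delta, sgo) pairs by delta, split them into n_bins near-equal-count
    bins, and return a list of (mean_delta, mean_sgo), one entry per bin."""
    ordered = sorted(pairs)
    n = len(ordered)
    averages = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:  # skip empty bins when there are fewer pairs than bins
            mean_delta = sum(d for d, _ in chunk) / len(chunk)
            mean_sgo = sum(s for _, s in chunk) / len(chunk)
            averages.append((mean_delta, mean_sgo))
    return averages


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


# Synthetic data: sGO decreases linearly as the rate difference grows.
pairs = [(i / 100, 1 - i / 100) for i in range(100)]
binned = bin_averages(pairs)
r = pearson_r([d for d, _ in binned], [s for _, s in binned])
print(len(binned), round(r, 3))
```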
In summary, genes with similar functions tend to have similar non-synonymous evolutionary rates, on average, while synonymous substitution rates show no such relationship with function. This is not surprising given the fact that non-synonymous substitutions, which change the encoded amino acid, have a more profound effect on protein structure and function than synonymous substitutions, which do not result in an amino acid change. Natural selection operates based on function and, at the molecular level, acts primarily to remove deleterious protein coding sequence variants. Nevertheless, the distinction between the patterns observed for ∆dN and ∆dS underscores a demonstrable connection between the particular effects of natural selection and the specific annotated function of yeast genes.
Phylogenetic profiles have also been successfully employed to guide computationally based functional inferences, under the assumption that functionally related genes will have similar patterns of presence and absence across different species. We sought to compare the relationships between phylogenetic profiles and the same GO-based semantic measure of functional similarity that we found to be related to non-synonymous evolutionary rates. The phylogenetic profiles we analyzed are binary presence (1) and absence (0) vectors. Two sets of profiles were used in this analysis: (i) Marcotte group profiles (Pellegrini et al. 1999) and (ii) COG database profiles (Tatusov et al. 2003). The Marcotte profiles were based on an evaluation of 16 species, and the similarities between profiles were scored using a log-likelihood ratio as previously described. The COG profiles were based on the presence and absence of orthologs among 71 species, and these profiles were compared here using Jaccard and Hamming similarity measures. As with the evolutionary rates, phylogenetic profile similarities were binned in ascending order, and average sGO values were compared to average profile similarities. All three comparisons yield a positive correlation between profile and functional similarity (Figure 3). In other words, genes that are functionally related tend to have more similar evolutionary histories in terms of gene gain and loss. However, the magnitude and significance of this effect were not nearly as strong as seen for the comparison between function and evolutionary rate. In fact, the Marcotte profiles did not yield a significantly positive correlation with sGO (Figure 3a). This may be attributable to the relative sparseness of this dataset; only ~3,000 profile comparisons over 16 species were available compared to >500,000 comparisons over 71 species for the COG data set. Indeed, COG based profiles were significantly correlated with sGO for the Jaccard similarity measure but not when Hamming similarities were used (Figure 3b and c). The different results observed for the Jaccard and Hamming measures reflect the fact that most binary phylogenetic profiles contain many absent (0) signals, and too many of these will dominate the Hamming measure, which simply counts all positions as similar or different. The Jaccard measure attains more sensitivity by ignoring vector positions that are scored as absent for both genes. Even in this case though, the strength of the correlation is not as great as typically observed for ∆dN-sGO.
We have demonstrated that functionally related yeast genes co-evolve with respect to the evolutionary rate at non-synonymous coding sequence positions. This effect is observed to be highly significant for all but the most closely related species comparison. For the data analyzed here, the correlation between function and evolutionary rate is stronger than seen for function and phylogenetic profiles. Rates of gene evolution are, for the most part, determined by the strength of purifying natural selection, which involves the removal of deleterious variants. As such, the results that we report here point to a close coupling between the particular function of a gene and the efficacy of purifying selection. The robust correlations between ∆dN and sGO also indicate that evolutionary rate comparisons can be used to aid functional inference and prediction.
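The contrast between the Jaccard and Hamming measures discussed for the binary phylogenetic profiles can be illustrated with a short Python sketch. The two 10-species profiles are hypothetical, and returning 0.0 when both profiles are entirely absent is a convention chosen for this sketch.

```python
def jaccard_similarity(a, b):
    """Shared presences divided by positions present in at least one profile;
    positions absent (0) in both profiles are ignored."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two all-absent profiles are scored as 0.0.
    return both / either if either else 0.0


def hamming_similarity(a, b):
    """Fraction of positions where the two profiles agree, counting shared
    absences (0/0) as agreement."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


# Hypothetical profiles over 10 species: the genes never co-occur,
# but both are absent from most species.
gene_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
gene_b = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]

print(jaccard_similarity(gene_a, gene_b))
print(hamming_similarity(gene_a, gene_b))
```

Here the shared absences alone give the two genes a Hamming similarity of 0.6 even though they never occur in the same species, whereas the Jaccard similarity is 0.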